\chapter{Stopped martingales (TO DO)}
\label{ch:martingales}
\section{How to make money almost surely}
We now take our newfound knowledge of measure theory to a casino.

Here's the most classical example that shows up:
a casino lets us play a game where we can bet any amount of
on a fair coin flip, but with bad odds:
we win $\$n$ if the coin is heads,
but lose $\$2n$ if the coin is tails,
for a value of $n$ of our choice.
This seems like a game that no one in their right mind would want to play.

Well, if we have unbounded time and money,
we actually can almost surely make a profit.
\begin{example}
	[Being even greedier than 18th century France]
	In the game above, we start by betting $\$1$.
	\begin{itemize}
		\ii If we win, we leave having made $\$1$.
		\ii If we lose, we then bet $\$10$ instead, and
		\begin{itemize}
			\ii If we win, then we leave having made $\$10-\$2=\$8$, and
			\ii If we lose then we bet $\$100$ instead, and
			\begin{itemize}
				\ii If we win, we leave having made $\$1000-\$20-\$2=\$978$, and
				\ii If we lose then we bet $\$1000$ instead, and so on\dots
			\end{itemize}
		\end{itemize}
	\end{itemize}
	Since the coin will almost surely show heads eventually,
	we make money whenever that happens.
	In fact, the expected amount of time until a coin shows heads
	is only $2$ flips! What could go wrong?
\end{example}
This chapter will show that under sane conditions
such as ``finite time'' or ``finite money'',
one cannot actually make money in this way --- the \emph{optional stopping theorem}.
This will give us an excuse to define conditional probabilities,
and then talk about martingales (which generalize the fair casino).

Once we realize that trying to extract money from Las Vegas is a lost cause,
we will stop gambling and then return to solving math problems,
by showing some tricky surprises,
where problems that look like they have nothing to do with gambling
can be solved by considering a suitable martingale.

In everything that follows, $\Omega = (\Omega, \SA, \mu)$ is a probability space.

\section{Sub-$\sigma$-algebras and filtrations}
\prototype{$\sigma$-algebra generated by a random variable, and coin flip filtration.}
We considered our $\Omega$ as a space of worlds,
equipped with a $\sigma$-algebra $\SA$ that lets us integrate over $\Omega$.
However, it is a sad fact of life that at any given time,
you only know partial information about the world.
For example, at the time of writing,
we know that the world did not end in 2012
(see \url{https://en.wikipedia.org/wiki/2012_phenomenon}),
but the fate of humanity in future years remains at slightly uncertain.

Let's write this measure-theoretically: we could consider
\begin{align*}
	\Omega &= A \sqcup B \\
	A &= \left\{ \omega \text{ for which world ends in $2012$} \right\} \\
	B &= \left\{ \omega \text{ for which world does not end in $2012$} \right\}.
\end{align*}
We will assume that $A$ and $B$ are measurable sets,
that is, $A,B \in \SA$.
That means we could have good fun arguing about what the values
of $\mu(A)$ and $\mu(B)$ should be
(``a priori probability that the world ends in $2012$''),
but let's move on to a different silly example.


We will now introduce a new notion that
we will need when we define conditional probabilities later.
\begin{definition}
	Let $\Omega = (\Omega, \SA, \mu)$ be a probability space.
	A \vocab{sub-$\sigma$-algebra} $\SF$
	on $\Omega$ is exactly what it sounds like:
	a $\sigma$-algebra $\SF$ on the set $\Omega$
	such that each $A \in \SF$ is measurable
	(i.e.,\ $\SF \subseteq \SA$).
\end{definition}

The motivation is that $\SF$ is the $\sigma$-algebra of sets
which let us ask questions about some piece of information.
For example, in the 2012 example we gave above,
we might take $\SF = \{\varnothing, A, B, \Omega\}$,
which are the sets we care about if we are thinking only about 2012.

Here are some more serious examples.
\begin{example}
	[Examples of sub-$\sigma$-algebras]
	\listhack
	\begin{enumerate}[(a)]
		\ii Let $X \colon \Omega \to \{0,1,2\}$ be a random
		variable taking on one of three values.
		If we're interested in $X$ then we could define
		\begin{align*}
			A &= \{\omega \mid X(\omega) = 1\} \\
			B &= \{\omega \mid X(\omega) = 2\} \\
			C &= \{\omega \mid X(\omega) = 3\}
		\end{align*}
		then we could write
		\[ \SF = \left\{ \varnothing, \; A, \; B, \; C, \;
				A \cup B, \; B \cup C, \; C \cup A, \; \Omega \right\}. \]
		This is a sub-$\sigma$-algebra on $\SF$
		that lets us ask questions about $X$
		like ``what is the probability $X \neq 3$'', say.

		\ii Now suppose $Y \colon \Omega \to [0,1]$ is another random variable.
		If we are interested in $Y$,
		the $\SF$ that captures our curiosity is
		\[ \SF = \left\{ Y\pre(B) \mid B \subseteq [0,1]
			\text{ is measurable } \right\}. \]
	\end{enumerate}
\end{example}

You might notice a trend here which we formalize now:
\begin{definition}
	Let $X \colon \Omega \to \RR$ be a random variable.
	The \vocab{sub-$\sigma$-algebra generated by $X$} is defined by
	\[ \sigma(X) \defeq \left\{ X\pre(B) \mid B \subseteq \RR
		\text{ is measurable } \right\}. \]
	If $X_1$, \dots\ is a sequence (finite or infinite) of random variables,
	the sub-$\sigma$-algebra generated by them
	is the smallest $\sigma$-algebra which contains $\sigma(X_i)$ for each $i$.
\end{definition}

Finally, we can put a lot of these together ---
since we're talking about time, we learn more as we grow older,
and this can be formalized.
\begin{definition}
	A \vocab{filtration} on $\Omega = (\Omega, \SA, \mu)$
	is a nested sequence\footnote{For convenience,
		we will restrict ourselves to $\ZZ_{\ge0}$-indexed
		filtrations, though really any index set is okay.}
	\[ \SF_0 \subseteq \SF_1 \subseteq \SF_2 \subseteq \dots \]
	of sub-$\sigma$-algebras on $\Omega$.
\end{definition}

\begin{example}
	[Filtration]
	Suppose you're bored in an infinitely long class
	and start flipping a fair coin to pass the time.
	(Accordingly, we could let $\Omega = \{H,T\}^\infty$
	consist of infinite sequences of heads $H$ and tails $T$.)
	We could let $\SF_n$ denote the sub-$\sigma$-algebra
	generated by the values of the first $n$ coin flips.
	So:
	\begin{itemize}
		\ii $\SF_0 = \{\varnothing, \Omega\}$,
		\ii $\SF_1 = \{\varnothing, \text{first flip $H$}, \text{first flip $T$}, \Omega\}$,
		\ii $\SF_2 = \{\varnothing, \text{first flips $HH$}, \text{second flip $T$}, \Omega, \text{first flip and second flip differ}, \dots\}$.
		\ii and so on, with $\SF_n$ being the measurable sets
		``determined'' only by the first $n$ coin flips.
	\end{itemize}
\end{example}

\begin{exercise}
	In the previous example, compute the cardinality $|\SF_n|$ for each integer $n$.
\end{exercise}

More importantly,
\begin{moral}
	$X$ is $\SF$-measurable if $X$ is determined only by the information given in $\SF$.
\end{moral}
\begin{example}
	In the example above, let $X_3$ be the value of the third coin flip. Then:
	\begin{itemize}
		\ii $X_3$ is not $\SF_2$-measurable. (That is, we don't know $X_3$ from the knowledge of the
		first $2$ coin flips.)
		\ii But it is $\SF_3$-measurable.
	\end{itemize}
\end{example}
\begin{exercise}
	Check this! (Recall that a function is measurable if it lifts open sets to measurable sets.
	So you need to show e.g. $X_3\pre(\{H\}) \notin \SF_2$.)
\end{exercise}
So, not only can we formalize partial information about the world, we can also formalize what it
means for something to only depend on that partial information.

\section{Conditional expectation}
\prototype{$\EE(X \mid X+Y)$ for $X$ and $Y$ distributed over $[0,1]$.}

We'll need the definition of conditional probability to define a martingale,
but this turns out to be surprisingly tricky.
Let's consider the following simple example to see why.
\begin{example}
	[Why high-school methods aren't enough here]
	Suppose we have two independent random variables $X$, $Y$ distributed
	uniformly over $[0,1]$ (so we may as well take $\Omega = [0,1]^2$).
	We might try to ask the question:
	\begin{quote}
		\itshape
		``what is the expected value of $X$
		given that $X+Y = 0.6$''?
	\end{quote}
	Intuitively, we know the answer has to be $0.3$.
	However, if we try to write down a definition, we quickly run into trouble.
	Ideally we want to say something like
	\[ \EE[X \text{ given } X+Y=0.6]
		= \frac{\int_{S} X}{\int_{S} 1}
		\text{ where }
		S = \left\{ \omega \in \Omega \mid X(\omega)+Y(\omega)=0.6 \right\}.  \]
	The problem is that $S$ is a set of measure zero,
	so we quickly run into $\frac 00$, meaning a definition
	of this shape will not work out.
\end{example}

The way that this is typically handled in measure theory
is to use the notion of sub-$\sigma$-algebra that we defined.

But first, we should explain what $\EE(X \mid X+Y)$ means first --- why are we conditioning on
another random variable instead of an event?

To motivate conditioning on a random variable, consider the following situation.
Suppose that the weather tomorrow depends on the weather today and the random fluctuations.
So we may have statements such as:
\begin{align*}
	\Pr(\text{it rains tomorrow}\mid \text{it rains today}) &= 0.6, \\
	\Pr(\text{it rains tomorrow}\mid \text{it doesn't rain today}) &= 0.3.
\end{align*}
This is the standard conditional probability: $\Pr(A \mid B) = \frac{\Pr(A \wedge B)}{\Pr(B)}$.

Note that ``the weather today'' is itself a random variable.

Let $Z$ be the weather forecast tonight's prediction of the probability, suppose it works as above.
Then $Z$ is a random real variable, defined by:
\begin{align*}
	Z &\colon \Omega \to \RR \\
	Z(\omega) &= \Pr(\text{it rains tomorrow}
			\mid \text{weather today} = \text{weather today}(\omega))
\end{align*}

It would only be reasonable to write
\[ Z = \Pr(\text{it rains tomorrow}\mid \text{weather today}). \]
We're conditioning on a random variable, and $\Pr(\cdots \mid \cdots)$ is itself a random variable
instead of a single value in $\RR$, but that's perfectly okay.

Similarly, if $\Omega$ is finite and every subset of it is measurable,
for random real variables $X$ and $Y$
it would be sensible for us to define random real variable $Z = \EE(X \mid Y)$ by
\begin{align*}
	Z &\colon \Omega \to \RR \\
	Z(\omega) &= \EE[X \mid Y = Y(\omega)].
\end{align*}

\begin{example}
	Let $X$ and $Y$ be the result of rolling two dices. Then:
	\begin{itemize}
		\ii $\EE[X \mid X+Y = 3] = 1.5$, as you can easily calculate.
		\ii $\EE(X \mid X+Y)$ is a random variable, which would takes the value
		$1.5$ in any world whether $X+Y = 3$.
		\ii More generally, we have in fact
		\[ \EE(X \mid X+Y) = \frac{X+Y}{2}. \]
	\end{itemize}
	Notice how the random variable $\EE(X \mid X+Y)$ depends only on the value of $X+Y$ --- by
	definition.
\end{example}

Of course, as we explained earlier, this naive attempts will give us division-by-zero
everywhere for the continuous case --- so, enters the sub-$\sigma$-algebra.

\begin{proposition}
	[Conditional expectation definition]
	\label{prop:conditional_exp}
	Let $X \colon \Omega \to \RR$ be an \emph{absolutely integrable}
	random variable (meaning $\EE[|X|] < \infty$)
	over a probability space $\Omega$,
	and let $\SF$ be a sub-$\sigma$-algebra on it.

	Then there exists a function $\eta \colon \Omega \to \RR$
	satisfying the following two properties:
	\begin{itemize}
		\ii $\eta$ is $\SF$-measurable (that is,
			measurable as a function $(\Omega, \SF, \mu) \to \RR$); and
		\ii for any set $A \in \SF$ we have
		$\EE[\eta \cdot \mathbf{1}_A] = \EE[X \cdot \mathbf{1}_A]$.
	\end{itemize}
	Moreover, this random variable is unique up to almost sureness.
\end{proposition}
\begin{proof}
	Omitted, but relevant buzzword used is ``Radon-Nikodym derivative''.
\end{proof}

\begin{definition}
	Let $\eta$ be as in the previous proposition.
	\begin{itemize}
		\ii We denote $\eta$ by $\EE(X \mid \SF)$
		and call it the \vocab{conditional expectation} of $X$ with respect to $\SF$.
		\ii If $Y$ is a random variable then
		$\EE(X \mid Y)$ denotes $\EE(X \mid \sigma(Y))$,
		i.e.\ the conditional expectation of $X$
		with respect to the $\sigma$-algebra generated by $Y$.
	\end{itemize}
\end{definition}

\begin{example}
	As we can expect, $\eta = \frac{X+Y}{2}$ satisfies the condition of $\EE(X \mid X+Y)$ above.

	The way to motivate doing all this is the following. We want to be able to say
	something like:
	\[ \EE[X \mid X+Y=0.6] = \lim_{\eps \to 0} \EE[X \mid 0.6-\eps < X+Y < 0.6+\eps] \]
	Unfortunately, this setup does not work in general where $\SF$ might not be
	generated by just one random real variable.
	Let's see how the definition above helps us.
	\begin{itemize}
		\ii Let $A = \{\omega \in \Omega \mid 0.6-\eps < X(\omega)+Y(\omega) < 0.6+\eps\}$,
		this set certainly belongs to the sub-$\sigma$-algebra generated by $X+Y$
		(because it is $(X+Y)\pre((0.6-\eps, 0.6+\eps))$).

		\ii Recall that $\eta$ is $\SF$-measurable means $\eta$ only depends on the information in
		$\SF = \sigma(X+Y)$, that is, on $X+Y$. This makes sense.

		\ii Look at the right hand side:
		\[ \EE[X \mid 0.6-\eps < X+Y < 0.6+\eps] = \EE[X \cdot \mathbf{1}_A]. \]

		The law of total expectation says that $\EE[\EE(X \mid Y)] = \EE[X]$.
		So, intuitively, the second property above simply requires
		this law to hold over all set $A \in \SF$.

		In our case, we have the following:
		\[ \EE[\eta \mid 0.6-\eps < X+Y < 0.6+\eps] = \EE[X \mid 0.6-\eps < X+Y < 0.6+\eps]. \]
	\end{itemize}
\end{example}

More fine print:
\begin{remark}
	[This notation is terrible]
	The notation $\EE(X \mid \SF)$ is admittedly confusing,
	since it is actually an entire function $\Omega \to \RR$,
	rather than just a real number like $\EE[X]$ --- though, as you can see, it has its merits.
	For this reason I try to be careful to remember
	to use parentheses rather than square brackets
	for conditional expectations; not everyone does this.
\end{remark}

\begin{abuse}
	In addition, when we write $Y = \EE(X \mid \SF)$,
	there is some abuse of notation happening here
	since $\EE(X \mid \SF)$ is defined only up to some reasonable uniqueness
	(i.e.\ up to measure zero changes).
	So this really means that
	``$Y$ satisfies the hypothesis of \Cref{prop:conditional_exp}'',
	but this is so pedantic that no one bothers.

	For example, in the example above, if we change $\eta$ to be
	\[ \eta(\omega) =
		\begin{cases}
			0 & \text{if }X(\omega)+Y(\omega)=0.6 \\
			\frac{X(\omega)+Y(\omega)}{2} & \text{otherwise}
		\end{cases}
	\]
	then $\EE[\eta \cdot \mathbf{1}_A] = \EE[X \cdot \mathbf{1}_A]$ still holds for every set $A$,
	but now it seems to be saying $\EE[X \mid X+Y=0.6] = 0$?

	Nevertheless, we must agree that we must sacrifice a measure zero set,
	since otherwise if we have
	\[ T(\omega) =
		\begin{cases}
			1&\text{if }Y(\omega) > 0.5\text{ or }(Y(\omega)=0.5\text{ and }X(\omega) \in \QQ) \\
			0&\text{otherwise}
		\end{cases}
	\]
	then it is certainly measurable (i.e. a random variable),
	$\EE[T \mid Y = 0.4] = 0$ and $\EE[T \mid Y = 0.6] = 1$,
	but what is $\EE[T \mid Y = 0.5]$?
	(You may argue it should be $0$, but what if $\QQ$ is changed to something more non-measurable?
	Besides, why should the conditional expectation change when we only modify $T$
	on a probability-zero set anyway?)
\end{abuse}

\todo{properties}

\section{Supermartingales}
\prototype{Visiting a casino is a supermartingale, assuming house odds.}

\begin{definition}
	Let $X_0$, $X_1$, \dots\ be a sequence of random variables
	on a probability space $\Omega$,
	and let $\SF_0 \subseteq \SF_1 \subseteq \cdots$ be a filtration.

	Then $(X_n)_{n \ge 0}$ is a \vocab{supermartingale}
	with respect to $(\SF_n)_{n \ge 0}$ if the following conditions hold:
	\begin{itemize}
		\ii $X_n$ is absolutely integrable for every $n$;
		\ii $X_n$ is measurable with respect to $\SF_n$; and
		\ii for each $n = 1, 2, \dots$ the inequality
		\[ \EE(X_n \mid \SF_{n-1}) \le X_{n-1} \]
		holds for all $\omega \in \Omega$.
	\end{itemize}

	In a \vocab{submartingale} the inequality $\le$ is replaced with $\ge$,
	and in a \vocab{martingale} it is replaced by $=$.
\end{definition}

\todo{what's the etymology of the term?}

\begin{abuse}
	[No one uses that filtration thing anyways]
	We will always take $\SF_n$ to be the $\sigma$-algebra
	generated by the previous variables $X_0$, $X_1$, \dots, $X_{n-1}$,
	and do so without further comment.
	Nonetheless, all the results that follow hold in the more general setting
	of a supermartingale with respect to some filtration.
\end{abuse}

We will prove all our theorems for supermartingales;
the analogous versions for submartingales can be obtained
by replacing $\le$ with $\ge$ everywhere
(since $X_n$ is a martingale iff $-X_n$ is a supermartingale)
and for martingales by replacing $\le$ with $=$ everywhere
(since $X_n$ is a martingale iff it is both a supermartingale
and a submartingale).

Let's give examples.
\begin{example}
	[Supermartingales]
	\label{ex:supermartingales}
	\listhack
	\begin{enumerate}[(a)]
		\ii \textbf{Random walks}:
		an ant starts at the position $0$ on the number line.
		Every minute, it flips a fair coin and either
		walks one step left or one step right.
		If $X_t$ is the position at the $t$th time,
		then $X_t$ is a martingale, because
		\[ \EE(X_t \mid X_0, \dots, X_{t-1})
			= \frac{(X_{t-1}+1) + (X_{t-1}-1)}{2} = X_{t-1}. \]

		\ii \textbf{Casino game}:
		Consider a gambler using the strategy described
		at the beginning of the chapter.
		This is a martingale, since every bet the gambler makes
		has expected value $0$.

		\ii \textbf{Multiplying independent variables}:
		Let $X_1$, $X_2$, \dots, be independent (not necessarily
		identically distributed) integrable random variables with mean $1$.
		Then the sequence $Y_1$, $Y_2$, \dots\ defined by
		\[ Y_n \defeq X_1 X_2 \cdots X_n \]
		is a martingale; as
		$\EE(Y_n \mid Y_1, \dots, Y_{n-1}) = \EE[Y_n] \cdot Y_{n-1} = Y_{n-1}$.

		\ii \textbf{Iterated blackjack}:
		Suppose one shows up to a casino and plays
		infinitely many games of blackjack.
		If $X_t$ is their wealth at time $t$, then $X_t$ is a supermartingale.
		This is because each game has negative expected value (house edge).
	\end{enumerate}
\end{example}

\begin{example}
	[Frivolous/inflamamtory example --- real life is a supermartingale]

	Let $X_t$ be your happiness on day $t$ of your life.
	Life has its ups and downs,
	so it is not the case that $X_t \le X_{t-1}$ for every $t$.
	For example, you might win the lottery one day.

	However, on any given day, many things can go wrong (e.g.\ zombie apocalypse),
	and by Murphy's Law this is more likely than things going well.
	Also, as you get older, you have an increasing number of responsibilities
	and your health gradually begins to deteriorate.

	Thus it seems that
	\[ \EE(X_t \mid X_0, \dots, X_{t-1}) \le X_{t-1} \]
	is a reasonable description of the future ---
	\emph{in expectation}, each successive day is
	slightly worse than the previous one.
	(In particular, if we set $X_t = -\infty$ on death,
	then as long as you have a positive probability of dying,
	the displayed inequality is obviously true.)
\end{example}

Before going on, we will state without proof one useful result:
if a martingale is bounded, then it will almost certainly converge.

\begin{theorem}
	[Doob's martingale convergence theorem]
	\label{thm:doob_martingale_converge}
	Let $X_0$, \dots\ be a supermartingale on a
	probability space $\Omega$ such that
	\[ \sup_{n \in \ZZ_{\ge 0}}
		\EE \left[ \left\lvert X_n \right\rvert \right] < \infty. \]
	Then, there exists a random variable $X_\infty \colon \Omega \to \RR$
	such that
	\[ X_n \asto X_\infty. \]
\end{theorem}

\section{Optional stopping}
\prototype{Las Vegas.}

In the first section we described how to make money almost surely.
The key advantage the gambler had was the ability to quit whenever he wanted
(equivalently, an ability to control the size of the bets;
betting \$0 forever is the same as quitting.)
Let's formalize a notion of stopping time.

The idea is we want to define a function
$\tau \colon \Omega \to \{0, 1, 2, \dots\} \cup \{\infty\}$ such that
\begin{itemize}
	\ii $\tau(\omega)$ specifies the index
	after which we \emph{stop} the martingale.
	Note that the decisions to stop after time $n$
	must be made with only the information available at that time ---
	i.e., with respect to $\SF_n$ of the filtration.

	\ii $X_{\tau \wedge n}$ is the random value representing the
	value at time $n$ of the stopped martingale,
	where if $n$ is \emph{after} the stopping time,
	we just take it to be the our currently value after we leave.

	So for example in a world $\omega$ where we stopped at time $3$, then
	$X_{\tau \wedge 0}(\omega) = X_0(\omega)$,
	$X_{\tau \wedge 1}(\omega) = X_1(\omega)$,
	$X_{\tau \wedge 2}(\omega) = X_2(\omega)$,
	$X_{\tau \wedge 3}(\omega) = X_3(\omega)$, but then
	\[ X_3(\omega)
		= X_{\tau \wedge 4}(\omega)
		= X_{\tau \wedge 5}(\omega)
		= X_{\tau \wedge 6}(\omega)
		= \dots
	\]
	since we have stopped --- the value stops changing.

	\ii $X_{\tau}$ denotes the eventual value after we stop
	(or the limit $X_\infty$ if we never stop).
\end{itemize}

Here's the compiled machine code.
\begin{definition}
	Let $\SF_0 \subseteq \SF_1 \subseteq \cdots$ be a filtration
	on a probability space $\Omega$.
	\begin{itemize}
		\ii A \vocab{stopping time} is a function
		\[ \tau \colon \Omega \to \{0, 1, 2, \dots\} \cup \{\infty\} \]
		with the property that for each integer $n$, the set
		\[ \left\{ \omega \in \Omega \mid \tau(\omega) = n \right\} \]
		is $\SF_n$-measurable (i.e., is in $\SF_n$).

		\ii For each $n \ge 0$ we define
		$X_{\tau \wedge n} \colon \Omega \to \RR$ by
		\[ X_{\tau \wedge n}(\omega)
		= X_{\min \left\{ \tau(\omega), n \right\}}(\omega) \]

		\ii Finally, we let the eventual outcome be denoted by
		\[ X_\tau(\omega)
			= \begin{cases}
				X_{\tau(\omega)}(\omega) & \tau(\omega) \neq \infty \\
				\lim_{n \to \infty} X_n(\omega) & \tau(\omega) = \infty
				\text{ and } \lim_{n \to \infty} X_n(\omega) \text{ exists } \\
				\text{undefined} & \text{otherwise}.
			\end{cases}
		\]
		We require that the ``undefined'' case occurs
		only for a set of measure zero
		(for example, if \Cref{thm:doob_martingale_converge} applies).
		Otherwise we don't allow $X_\tau$ to be defined.
	\end{itemize}
\end{definition}


\begin{proposition}
	[Stopped supermartingales are still supermartingales]
	Let $X_0$, $X_1$, \dots\ be a supermartingale.
	Then the sequence
	\[ X_{\tau \wedge 0}, \; X_{\tau \wedge 1}, \; \dots \]
	is itself a supermartingale.
\end{proposition}
\begin{proof}
	We have almost everywhere the inequalities
	\begin{align*}
		\EE\left( X_{\tau \wedge n} \mid \SF_{n-1} \right)
		&= \EE \left( X_{n-1} + \mathbf 1_{\tau(\omega) = n-1} (X_n - X_{n-1}) \mid \SF_{n-1} \right) \\
		&= \EE \left( X_{n-1} \mid \SF_{n-1} \right)
		+ \EE \left( \mathbf 1_{\tau(\omega) = n-1} \cdot (X_n - X_{n-1}) \mid \SF_{n-1} \right) \\
		&= X_{n-1} + \mathbf 1_{\tau(\omega) = n-1}
			\cdot\EE \left(  X_n - X_{n-1} \mid \SF_{n-1} \right)
		\le X_{n-1}
	\end{align*}
	as functions from $\Omega \to \RR$.
\end{proof}

\begin{theorem}
	[Doob's optional stopping theorem]
	Let $X_0$, $X_1$, \dots\ be a supermartingale on a probability space $\Omega$,
	with respect to a filtration $\SF_0 \subseteq \SF_1 \subseteq \cdots$.
	Let $\tau$ be a stopping time with respect to this filtration.
	Suppose that \emph{any} of the following hypotheses are true,
	for some constant $C$:
	\begin{enumerate}[(a)]
		\ii \textbf{Finite time}: $\tau(\omega) \le C$ for almost all $\omega$.
		\ii \textbf{Finite money}: for each $n \ge 1$,
		$\left\lvert X_{\tau \wedge n}(\omega) \right\rvert \le C$
		for almost all $\omega$.
		\ii \textbf{Finite bets}: we have $\mathbb E[\tau] < \infty$,
		and for each $n \ge 1$, the conditional expectation
		\[ \EE\left( \left\lvert X_n-X_{n-1} \right\rvert
			\mid \SF_n \right) \]
		takes on values at most $C$ for almost all $\omega \in \Omega$
		satisfying $\tau(\omega) \ge n$.
	\end{enumerate}
	Then $X_\tau$ is well-defined almost everywhere,
	and more importantly, \[ \EE[X_\tau] \le \EE[X_0]. \]
\end{theorem}
The last equation can be cheekily expressed as
``the only winning move is not to play''.

\begin{proof}
	\todo{do later tonight}  % *eventually
\end{proof}

\begin{exercise}
	Conclude that going to Las Vegas with the strategy
	described in the first section is a really bad idea.
	What goes wrong?
\end{exercise}

While this is useful to make us stop gambling, it doesn't allow us to compute anything ---
we don't know anything about $\EE[X_\tau]$ other than it's $\leq \EE[X_0]$.
However:
\begin{corollary}
	With the same hypothesis as above:
	\begin{itemize}
		\ii If $X_0$, $X_1$, \dots\ is a submartingale, then $\EE[X_\tau] \geq \EE[X_0]$.
		\ii If it is a martingale, then $\EE[X_\tau] = \EE[X_0]$.
	\end{itemize}
\end{corollary}
\begin{proof}
	If $X_0$, $X_1$, \dots\ is a submartingale, then $Y_0$, $Y_1$, \dots\ defined by $Y_i = -X_i$ is
	a supermartingale, and the hypothesis is still satisfied. Apply the theorem to $Y_\tau$ we
	get the result.

	If $X_0$, $X_1$, \dots\ is a martingale, then it is both a supermartingale and a submartingale,
	the result follows immediately.
\end{proof}
This finally let us calculate something --- if we can compute $\EE[X_0]$ and write the result as
$\EE[X_\tau]$ for some martingale, then we can solve the problem!

\section{Fun applications of optional stopping (TO DO)}
%% for 18.A34 we can do the random walk example
We now give three problems which showcase some of the power of
the results we have developed so far.

\subsection{The ballot problem}
Suppose Alice and Bob are racing in an election;
Alice received $a$ votes total while Bob received $b$ votes total, and $a > b$.
If the votes are chosen in random order,
one could ask: what is the probability that Alice remains strictly ahead of
Bob in the election?

\missingfigure{path}

\begin{proposition}
	[Ballot problem]
	This occurs with probability $\frac{a-b}{a+b}$.
\end{proposition}

We should try to model this as a martingale.
A natural way to do it is the random walk, as in \Cref{ex:supermartingales}:
\begin{align*}
	X_0 &= 0 \\
	X_i &= X_{i-1} +
	\begin{cases}
		1\text{ with probability }\frac{1}{2} \\
		-1\text{ otherwise.}
	\end{cases}
\end{align*}
Here, each $1$ represents Alice getting a vote, and each $-1$ represents Bob getting a vote.

Then, we need to compute
\[ \Pr[X_i > 0\text{ for all }1 \leq i \leq a+b \mid X_{a+b} = a-b]. \]

While this is natural, the fact that the probability is conditioned on $X_{a+b}=a-b$ makes us unable
to apply the optional stopping theorem.

Instead, the following will work:
We start with all the votes, and \emph{remove} them in random order.
\begin{align*}
	(A_0, B_0) &= (a, b), \\
	(A_i, B_i) &=
	\begin{cases}
		(A_{i-1}-1, B_{i-1})\text{ with probability }\frac{A_{i-1}}{A_{i-1}+B_{i-1}} \\
		(A_{i-1}, B_{i-1}-1)\text{ otherwise}
	\end{cases}
	\text{ for }1 \leq i \leq a+b.
\end{align*}
Of course, here $A_i$ represents the number of votes Alice has left,
and $B_i$ represents the number of votes Bob has left.

Now we just need to compute
\[ \Pr[A_i>B_i\text{ for all }0 \leq i \leq a+b-1]. \]
There's no longer any conditional expectation!

Next, we need to construct a martingale.
We could try to define $X_i = A_i-B_i$ similar to above, but that will not be a martingale.

Here is the trick: we \emph{modify} $X_i$ to make it a martingale. (More examples where a sequence
of random variables is modified to create a martingale can be found in \Cref{exer:martingale}.)

\begin{example}
	Suppose $a=2$ and $b=1$. Then:
	\begin{align*}
		(A_0, B_0) &= (2, 1), \\
		(A_1, B_1) &= \begin{cases}
			(1, 1)\text{ with probability }\frac{2}{3} \\
			(2, 0)\text{ otherwise.}
		\end{cases}
	\end{align*}
	So $\EE[A_0-B_0]=1$ while $\EE[A_1-B_1]=\frac{2}{3}\cdot 0+\frac{1}{3}\cdot 2=\frac{2}{3}$,
	if we define $X_i$ as above of course it wouldn't be a martingale.

	However, it's not difficult to find ways to modify it to form a martingale. For example:
	\begin{itemize}
		\ii If we define $X_1 = A_1-B_1+\frac{1}{3}$, then $\EE[X_1]=1$, so we're fine.
		\ii Similarly, we can also define $X_1 = \frac{3}{2}\cdot(A_1-B_1)$.
		\ii Or $X_1 = \frac{9}{4}\cdot (A_1-B_1)^2$.
		\ii \dots\ et cetera\dots
	\end{itemize}
\end{example}

We need to make $\EE[X_i \mid X_0=x_0, X_1=x_1, \dots, X_{i-1}=x_{i-1}]=x_{i-1}$
for \emph{every} $i$ and every $(x_0, x_1, \dots, x_{i-1})$. (You can try to find out yourself what
modification will work yourself before continue reading.)

Turns out, the following will work:
\[
	X_i = \frac{A_i-B_i}{a+b-i}\text{ for }0 \leq i \leq a+b-1.
\]
We cannot extend it to $i \geq a+b$, but it is fine.
(If you're worried, just define $X_i = X_{a+b-1}$ for $i \geq a+b$.)
\begin{exercise}
	Check that it works. (The math is very similar to the problem about P\'{o}lya's urn in
	\Cref{exer:martingale}.)
\end{exercise}
For the stopping time, there is only one natural way to define it:
$\tau$ is $a+b-1$ if Alice remains ahead of Bob i.e. $X_i>0$ for every $0 \leq i \leq a+b-1$,
otherwise $\tau$ is the smallest $i$ such that $X_i=0$.

Then, the optional stopping theorem states:
\[ \EE[X_\tau] = \EE[X_0]. \]
$\EE[X_0]$ is easy to calculate, it is $\frac{A_0-B_0}{a+b}=\frac{a-b}{a+b}$.
What is $\EE[X_\tau]$?
\begin{itemize}
	\ii If Alice remains ahead of Bob, $X_\tau = X_{a+b-1} = 1$.
	\ii Otherwise, $X_\tau = 0$.
\end{itemize}
Therefore $\EE[X_\tau]$ is exactly the probability we need to calculate, so we're done.

\begin{remark}[This is cheating a little]
	Note that you can equivalently write
	\[ X_i = \frac{A_i-B_i}{A_i+B_i}. \]
	Which is exactly the form of the answer.

	That is to say, if you already know the form of the answer, martingale theory can help you to
	check it. But if you don't\dots
\end{remark}

\subsection{ABRACADABRA}
To be written.

\url{https://www.jeremykun.com/2014/03/03/martingales-and-the-optional-stopping-theorem/}


\subsection{USA TST 2018}
To be written.

\url{https://aops.com/community/p9513099}

\section{\problemhead}

\begin{problem}
	[Examples of martingales]
	\label{exer:martingale}
	We give some more examples of martingales.
	\begin{enumerate}[(a)]
		\ii \textbf{(Simple random walk)}
		Let $X_1$, $X_2$, \dots\ be i.i.d.\ random variables
		which equal $+1$ with probability $1/2$,
		and $-1$ with probability $1/2$.
		Prove that
		\[ Y_n = \left( X_1 + \dots + X_n \right)^2 - n \]
		is a martingale.

		\ii \textbf{(de Moivre's martingale)}
		Fix real numbers $p$ and $q$ such that $p,q > 0$ and $p+q=1$.
		Let $X_1$, $X_2$, \dots\ be i.i.d.\ random variables
		which equal $+1$ with probability $p$,
		and $-1$ with probability $q$.
		Show that
		\[ Y_n = \left(qp^{-1}\right)^{X_1 + X_2 + \dots + X_n} \]
		is a martingale.

		\ii \textbf{(P\'{o}lya's urn)}
		An urn contains one red and one blue marble initially.
		Every minute, a marble is randomly removed from the urn,
		and two more marbles of the same color are added to the urn.
		Thus after $n$ minutes, the urn will have $n+2$ marbles.

		Let $r_n$ denote the fraction of marbles which are red.
		Show that $r_n$ is a martingale.
	\end{enumerate}
\end{problem}

\begin{problem}
	A deck has $52$ cards; of them $26$ are red and $26$ are black.
	The cards are drawn and revealed one at a time.
	At any point, if there is at least one card remaining in the deck,
	you may stop the dealer;
	you win if (and only if) the next card in the deck is red.
	If all cards are dealt, then you lose.
	Across all possible strategies,
	determine the maximal probability of winning.
	\begin{hint}
		There is a cute elementary solution.
		For the martingale-based solution,
		show that the fraction of red cards in the deck at time $n$ is a martingale.
	\end{hint}
\end{problem}

\begin{problem}
	[Wald's identity]
	Let $\mu$ be a real number.
	Let $X_1$, $X_2$, \dots\ be independent random variables
	on a probability space $\Omega$ with mean $\mu$.
	Finally let $\tau \colon \Omega \to \{1, 2, \dots\}$
	be a stopping time such that $\mathbb E[\tau] < \infty$,
	and the event $\tau = n$ depends only on $X_1$, \dots, $X_n$.

	Prove that
	\[ \EE[X_1 + X_2 + \dots + X_\tau] = \mu \EE[\tau]. \]
\end{problem}

\begin{problem}
	[Unbiased drunkard's walk]
	An ant starts at $0$ on a number line,
	and walks left or right one unit with probability $1/2$.
	It stops once it reaches either $-17$ or $+8$.
	\begin{enumerate}[(a)]
		\ii Find the probability it reaches $+8$ before $-17$.

		\ii Find the expected value of the amount of time
		it takes to reach either endpoint.
	\end{enumerate}
\end{problem}

\begin{problem}
	[Biased drunkard's walk]
	Let $0 < p < 1$ be a real number.
	An ant starts at $0$ on a number line,
	and walks left or right one unit with probability $p$.
	It stops once it reaches either $-17$ or $+8$.
	Find the probability it reaches $+8$ first.
	\begin{hint}
		Use \Cref{exer:martingale}.
	\end{hint}
\end{problem}

\begin{problem}
	The number $1$ is written on a blackboard.
	Every minute, if the number $a$ is written on the board,
	it's erased and replaced by a real number
	in the interval $[0, 2.01a]$ selected uniformly at random.
	What is the probability that the resulting sequence of numbers approaches $0$?
	\begin{hint}
		It occurs with probability $1$.
		If $X_n$ is the number on the board at step $n$,
		and $\mu = \frac{1}{2.01} \int_0^{2.01} \log t \; dt$,
		show that $\log(X_n) - n \mu$ is a martingale.
		(Incidentally, using the law of large numbers could work too.)
	\end{hint}
\end{problem}
